Goto

Collaborating Authors

 adjacency matrix





Appendix Table of Contents

Neural Information Processing Systems

There are several key limitations of the MADE algorithm: 1. As mentioned in Section 3.1, the MADE algorithm can only mask neural networks such that they respect the autoregressive property. The non-deterministic MADE masking algorithm presented in Germain et al. [2015], the resulting Proposition 1 formalizes this point. In Section 3.1, we showed that finding the weight masks for each neural network layer is equivalent Figure 7 provides a visual example of the steps performed by Algorithm 1. 's last row, we need the products of the last row of Randomly generated adjacency structures of 15 dimensions. IP gives better objective values when the adjacency matrix is very sparse.




How a student becomes a teacher: learning and forgetting through Spectral methods

Neural Information Processing Systems

The above scheme proves particularly relevant when the student network is overparameterized (namely, when larger layer sizes are employed) as compared to the underlying teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network.


How a student becomes a teacher: learning and forgetting through Spectral methods

Neural Information Processing Systems

The above scheme proves particularly relevant when the student network is overparameterized (namely, when larger layer sizes are employed) as compared to the underlying teacher network. Under these operating conditions, it is tempting to speculate that the student ability to handle the given task could be eventually stored in a sub-portion of the whole network.